Self-Organizing Middleware for Extreme Heterogeneity: The Role of Technology-Oblivious Machine Learning

Authors

  • Ronald F. DeMara
  • Rizwan A. Ashraf
Abstract

Extreme Heterogeneity (EH) resulting from diverse processors, accelerators, and memory devices imposes urgent challenges for supporting scientific workflows in the post-Moore era. The position taken is that in-the-loop runtime machine learning and big-data approaches can offer feasible and efficient mechanisms to manage vast resources by realizing a thin autonomous layer of control and optimization. Research on autonomous middleware, residing between diverse hardware resources and existing system software primitives, is worthy of community discussion as a route to developer-transparent support for EH challenges. Furthermore, by prioritizing a self-aware, self-organizing, and self-optimizing perspective, a thin middleware layer could be readily transportable despite changes to the underlying computing stack. Overall, it will be advocated that such an approach could transform EH from a “formidable foe of HPC” into a “valuable ally of HPC” on an incremental path to 2025-2040.

Keywords: Runtime Systems for Extreme Heterogeneity; Machine Learning; Data Clustering; Fault-Propagation Classifier; Correctness Despite Silent Data Corruption; Energy-Aware High Performance Computing; Runtime Dynamic Resource Optimization.

I. CHALLENGES & PERSPECTIVE OF EXTREME HETEROGENEITY

This position paper advocates the potential of online and offline machine-learning approaches to provide a feasible yet high-payoff computer science research direction to address the following Extreme Heterogeneity (EH) Challenges:

C1: efficient resource utilization across disparate computing paradigms/components, including multi/manycore CPU, GPU, TPU, FPGA, and future neuromorphic-based accelerators;
C2: mitigating the perils of Silent Data Corruption (SDC) propagation, which are not negligible in HPC applications at the vastness of post-exascale component cardinalities; and
C3: the intractability for human optimization of resilience vs. energy tradeoffs [4], whereby reducing supply voltage or cooling to save energy may actually increase energy consumption, because running time is lengthened due to errors impacting convergence.

Research Direction: Unsupervised and semi-supervised learning offer promising research approaches to EH challenges. Large-scale analysis of operational measures and vulnerability metrics could provide a wealth of orthogonal input data to machine learning approaches. Aided by its diversity, such data empowers runtime and offline learning strategies developed to attain pre-established {throughput, resilience, energy} objectives [6][8][11]. These could be researched first for the execution behavior of individual jobs and layered later for a typical multiuser scenario.

State of the Art: Energy-aware HPC methods have an extensive basis, as do various methods for achieving sufficient correctness despite SDC during execution. However, managing both of these goals simultaneously as competing objectives remains an urgent, vital, and open research issue. Moreover, EH will further exacerbate their interaction and make them intractable to manage without some form of machine learning support. It is promulgated herein that feasible extensions to existing infrastructures for runtime dynamic resource
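The resilience vs. energy tradeoff named in challenge C3 can be illustrated with a toy model. The sketch below is not taken from the paper: the exponential error model, the retry model, and every constant and function name are hypothetical assumptions chosen only to show how lowering supply voltage can raise total energy once error-induced rework dominates. The one standard ingredient is that dynamic CMOS power scales roughly with the square of supply voltage at fixed frequency.

```python
import math

# All models and constants below are illustrative assumptions, not measurements.

def failure_probability(v, v_nominal=1.0, sensitivity=25.0, p_at_nominal=1e-6):
    """Assumed model: the chance that a unit of work is corrupted and must be
    redone grows exponentially as supply voltage v drops below nominal."""
    p = p_at_nominal * math.exp(sensitivity * (v_nominal - v))
    return min(p, 0.999)  # cap so the expected runtime stays finite

def expected_runtime(v, base_runtime_s=100.0):
    """Assumed retry model: corrupted work is repeated until it succeeds,
    so the expected number of attempts is 1 / (1 - p)."""
    return base_runtime_s / (1.0 - failure_probability(v))

def power(v, power_at_nominal_w=200.0, v_nominal=1.0):
    """Dynamic power scales with V^2 at fixed clock frequency."""
    return power_at_nominal_w * (v / v_nominal) ** 2

def energy(v):
    """Total energy in joules = power x expected runtime."""
    return power(v) * expected_runtime(v)

# Sweep supply voltage from 0.40 to 1.00 (normalized units) in 0.01 steps.
voltages = [round(0.40 + 0.01 * i, 2) for i in range(61)]
best_v = min(voltages, key=energy)

if __name__ == "__main__":
    for v in (1.00, 0.60, best_v, 0.45):
        print(f"V={v:.2f}  p_fail={failure_probability(v):.4f}  "
              f"runtime={expected_runtime(v):9.1f} s  energy={energy(v):10.1f} J")
```

Under these assumptions, energy falls as voltage is reduced from nominal, reaches a minimum at an intermediate voltage, and then climbs steeply, eventually exceeding the nominal-voltage energy. With many interacting components and objectives, locating such a minimum by hand becomes impractical, which is the motivation the text gives for learned runtime control.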

Similar resources

Application of the Extreme Learning Machine for Modeling the Bead Geometry in Gas Metal Arc Welding Process

Rapid prototyping (RP) methods are used to produce a scale model of a physical part or assembly easily and quickly. Gas metal arc welding (GMAW) is a widespread process used for rapid prototyping of metallic parts. In this process, in order to obtain a desired welding geometry, it is very important to predict the weld bead geometry based on the input process parameters, which are voltage...

Modeling Discharge Coefficient of Side Weir on Converging Channel Using Extreme Learning Machine

In this study, the discharge coefficient of side weirs located on converging channels was simulated for the first time using a new method, the Extreme Learning Machine (ELM). To examine the accuracy of the numerical model, Monte Carlo simulations were used, and validation against the experimental values was conducted by the k-fold cross validation method. Then, the input parameters were detected for s...

The Time Adaptive Self Organizing Map for Distribution Estimation

The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the time-decreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights for a varied environment. In dealing with non-stationary input distributions and changi...

A Hybrid Machine Learning Method for Intrusion Detection

Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Various artificial intelligence techniques have already been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...

Simulation of Scour Pattern Around Cross-Vane Structures Using Outlier Robust Extreme Learning Machine

In this research, the scour hole depth downstream of cross-vane structures with different shapes (i.e., J, I, U, and W) was simulated using a modern artificial intelligence method called the "Outlier Robust Extreme Learning Machine (ORELM)". The observational data were divided into two groups: training (70%) and test (30%). Then, using the input parameters including the ratio of the st...


Publication date: 2018